Search for: All records
Total Resources: 4
- Chiruzzo, Luis; Ritter, Alan; Wang, Lu (Ed.)
  The instruction hierarchy, which establishes a priority order from system messages to user messages, conversation history, and tool outputs, is essential for ensuring consistent and safe behavior in language models (LMs). Despite its importance, this topic receives limited attention, and there is a lack of comprehensive benchmarks for evaluating models’ ability to follow the instruction hierarchy. We bridge this gap by introducing IHEval, a novel benchmark comprising 3,538 examples across nine tasks, covering cases where instructions in different priorities either align or conflict. Our evaluation of popular LMs highlights their struggle to recognize instruction priorities. All evaluated models experience a sharp performance decline when facing conflicting instructions, compared to their original instruction-following performance. Moreover, the most competitive open-source model only achieves 48% accuracy in resolving such conflicts. Our results underscore the need for targeted optimization in the future development of LMs.
  Free, publicly-accessible full text available April 27, 2026
- Wei, Tianxin; Jin, Bowen; Li, Ruirui; Zeng, Hansi; Wang, Zhengyang; Sun, Jianhui; Yin, Qingyu; Lu, Hanqing; Wang, Suhang; He, Jingrui; et al. (ICLR)
- Zuo, Simiao; Yin, Qingyu; Jiang, Haoming; Xi, Shaohui; Yin, Bing; Zhang, Chao; Zhao, Tuo (Association for Computational Linguistics)
- Feng, Rui; Luo, Chen; Yin, Qingyu; Yin, Bing; Zhao, Tuo; Zhang, Chao (Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies)